optimal map
Collision-based Dynamics for Multi-Marginal Optimal Transport
Inspired by the Boltzmann kinetics, we propose a collision-based dynamics with a Monte Carlo solution algorithm that approximates the solution of the multi-marginal optimal transport problem via randomized pairwise swapping of sample indices. The computational complexity and memory usage of the proposed method scale linearly with the number of samples, making it highly attractive for high-dimensional settings. In several examples, we demonstrate the efficiency of the proposed method compared to the state-of-the-art methods.
Families of costs with zero and nonnegative MTW tensor in optimal transport
We compute explicitly the MTW tensor (or cross curvature) for the optimal transport problem on $\mathbb{R}^n$ with a cost function of form $\mathsf{c}(x, y) = \mathsf{u}(x^{\mathfrak{t}}y)$, where $\mathsf{u}$ is a scalar function with inverse $\mathsf{s}$, $x^{\ft}y$ is a nondegenerate bilinear pairing of vectors $x, y$ belonging to an open subset of $\mathbb{R}^n$. The condition that the MTW-tensor vanishes on null vectors under the Kim-McCann metric is a fourth-order nonlinear ODE, which could be reduced to a linear ODE of the form $\mathsf{s}^{(2)} - S\mathsf{s}^{(1)} + P\mathsf{s} = 0$ with constant coefficients $P$ and $S$. The resulting inverse functions include {\it Lambert} and {\it generalized inverse hyperbolic\slash trigonometric} functions. The square Euclidean metric and $\log$-type costs are equivalent to instances of these solutions. The optimal map for the family is also explicit. For cost functions of a similar form on a hyperboloid model of the hyperbolic space and unit sphere, we also express this tensor in terms of algebraic expressions in derivatives of $\mathsf{s}$ using the Gauss-Codazzi equation, obtaining new families of strictly regular costs for these manifolds, including new families of {\it power function costs}. We analyze the $\sinh$-type hyperbolic cost, providing examples of $\mathsf{c}$-convex functions and divergence.
The troublesome kernel -- On hallucinations, no free lunches and the accuracy-stability trade-off in inverse problems
Gottschling, Nina M., Antun, Vegard, Hansen, Anders C., Adcock, Ben
Methods inspired by Artificial Intelligence (AI) are starting to fundamentally change computational science and engineering through breakthrough performances on challenging problems. However, reliability and trustworthiness of such techniques is becoming a major concern. In inverse problems in imaging, the focus of this paper, there is increasing empirical evidence that methods may suffer from hallucinations, i.e., false, but realistic-looking artifacts; instability, i.e., sensitivity to perturbations in the data; and unpredictable generalization, i.e., excellent performance on some images, but significant deterioration on others. This paper presents a theoretical foundation for these phenomena. We give a mathematical framework describing how and when such effects arise in arbitrary reconstruction methods, not just AI-inspired techniques. Several of our results take the form of `no free lunch' theorems. Specifically, we show that (i) methods that overperform on a single image can wrongly transfer details from one image to another, creating a hallucination, (ii) methods that overperform on two or more images can hallucinate or be unstable, (iii) optimizing the accuracy-stability trade-off is generally difficult, (iv) hallucinations and instabilities, if they occur, are not rare events, and may be encouraged by standard training, (v) it may be impossible to construct optimal reconstruction maps for certain problems. Our results trace these effects to the kernel of the forward operator whenever it is nontrivial, but also extend to the case when the forward operator is ill-conditioned. Based on these insights, our work aims to spur research into new ways to develop robust and reliable AI-inspired methods for inverse problems in imaging.
Generative modeling of time-dependent densities via optimal transport and projection pursuit
Botvinick-Greenhouse, Jonah, Yang, Yunan, Maulik, Romit
Such processes are visible in many applications ranging from geoscience, bioscience, and engineering to computer vision. In particular, deep learning algorithms, such as neural network parameterized normalizing flows, neural ordinary differential equations, diffusion models, and generative adversarial networks, have shown remarkable advances in learning and enabling rapid sampling from these stochastic processes. Such advances are further pronounced for very high-dimensional systems where classical methods are seen to saturate their effectiveness. However, the effective use of deep learning is frequently hampered by difficulties associated with computational cost as well as optimal hyperparameter selection. In this article, we propose a novel approach based on projection-pursuit optimal transport, which learns to sample from the densities of time-varying stochastic processes. It is competitive (both in terms of computational cost and accuracy) with a state-of-the-art deep learning algorithm (given by the neural spline flow). Crucially, our proposed method requires few hyperparameter choices by the user in contrast with most neural network-based methodologies. Thus, our main contributions to this work are as follows: 1. We implement a projection-pursuit optimal transport-based method to learn maps between time-varying densities from snapshots of particles sampled from these densities.
Adversarial Computation of Optimal Transport Maps
Leygonie, Jacob, She, Jennifer, Almahairi, Amjad, Rajeswar, Sai, Courville, Aaron
Computing optimal transport maps between high-dimensional and continuous distributions is a challenging problem in optimal transport (OT). Generative adversarial networks (GANs) are powerful generative models which have been successfully applied to learn maps across high-dimensional domains. However, little is known about the nature of the map learned with a GAN objective. To address this problem, we propose a generative adversarial model in which the discriminator's objective is the $2$-Wasserstein metric. We show that during training, our generator follows the $W_2$-geodesic between the initial and the target distributions. As a consequence, it reproduces an optimal map at the end of training. We validate our approach empirically in both low-dimensional and high-dimensional continuous settings, and show that it outperforms prior methods on image data.
Large-Scale Optimal Transport and Mapping Estimation
Seguy, Vivien, Damodaran, Bharath Bhushan, Flamary, Rรฉmi, Courty, Nicolas, Rolet, Antoine, Blondel, Mathieu
This paper presents a novel two-step approach for the fundamental problem of learning an optimal map from one distribution to another. First, we learn an optimal transport (OT) plan, which can be thought as a one-to-many map between the two distributions. To that end, we propose a stochastic dual approach of regularized OT, and show empirically that it scales better than a recent related approach when the amount of samples is very large. Second, we estimate a \textit{Monge map} as a deep neural network learned by approximating the barycentric projection of the previously-obtained OT plan. This parameterization allows generalization of the mapping outside the support of the input measure. We prove two theoretical stability results of regularized OT which show that our estimations converge to the OT plan and Monge map between the underlying continuous measures. We showcase our proposed approach on two applications: domain adaptation and generative modeling.
Tractable Fully Bayesian Inference via Convex Optimization and Optimal Transport Theory
Kim, Sanggyun, Mesa, Diego, Ma, Rui, Coleman, Todd P.
We consider the problem of transforming samples from one continuous source distribution into samples from another target distribution. We demonstrate with optimal transport theory that when the source distribution can be easily sampled from and the target distribution is log-concave, this can be tractably solved with convex optimization. We show that a special case of this, when the source is the prior and the target is the posterior, is Bayesian inference. Here, we can tractably calculate the normalization constant and draw posterior i.i.d. samples. Remarkably, our Bayesian tractability criterion is simply log concavity of the prior and likelihood: the same criterion for tractable calculation of the maximum a posteriori point estimate. With simulated data, we demonstrate how we can attain the Bayes risk in simulations. With physiologic data, we demonstrate improvements over point estimation in intensive care unit outcome prediction and electroencephalography-based sleep staging.